Fall back to control connection when host pools are empty #722

Draft
dkropachev wants to merge 1 commit into scylladb:master from dkropachev:fix/control-connection-fallback-720

Conversation

@dkropachev (Collaborator) commented Feb 23, 2026

Summary

  • When all hosts are marked IGNORED by the load-balancing policy (e.g. WhiteListRoundRobinPolicy with a NAT address not known to the cluster), no connection pools are created. Instead of raising NoHostAvailable on Session.connect(), the driver now logs a warning and falls back to executing queries on the already-established control connection.
  • Adds ResponseFuture._query_control_connection() method that borrows the control connection directly when session._pools is empty.
  • Adds integration test reproducing the exact scenario from the issue (connect via unadvertised NAT proxy with whitelist policy).

Fixes: #720
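The fallback described above can be sketched as follows. This is a minimal model, not the driver's actual code: the names `Session._pools`, `ControlConnection`, and the tuple return value are assumptions made for illustration.

```python
import logging

log = logging.getLogger(__name__)

class ControlConnection:
    """Stand-in for the driver's already-established control connection."""
    def execute(self, query):
        return ("control-connection", query)

class Session:
    def __init__(self, pools, control_connection):
        self._pools = pools  # host -> connection pool mapping
        self._control_connection = control_connection

    def execute(self, query):
        if not self._pools:
            # All hosts were marked IGNORED by the load-balancing policy,
            # so no pools exist. Instead of raising NoHostAvailable, log a
            # warning and borrow the control connection for the query.
            log.warning("No host pools available; "
                        "falling back to the control connection")
            return self._control_connection.execute(query)
        raise NotImplementedError("normal pool-based execution path")
```

In the PR itself this logic lives in `ResponseFuture._query_control_connection()`; the sketch above only shows the decision point.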

Test plan

  • Unit tests pass (pytest tests/unit/)
  • Integration test: SCYLLA_VERSION="release:2025.2" uv run pytest tests/integration/standard/test_public_address.py -s

dkropachev force-pushed the fix/control-connection-fallback-720 branch from 498b10c to 39ae69c on February 23, 2026 at 17:37
When all hosts are marked IGNORED by the load-balancing policy (e.g.
WhiteListRoundRobinPolicy with a NAT address not known to the cluster),
no connection pools are created. Instead of raising NoHostAvailable on
Session.connect(), log a warning and fall back to executing queries on
the already-established control connection.

Fixes: scylladb#720
dkropachev force-pushed the fix/control-connection-fallback-720 branch from 39ae69c to 5a24812 on February 23, 2026 at 20:06
@sylwiaszunejko (Collaborator)

@dkropachev I may be misunderstanding your approach, so I’d appreciate some clarification.

From what I see, the bug described in the issue started occurring in version 3.29.8, while everything worked correctly in 3.29.7. Because of that, I’m not sure how this PR would be reverting a newly introduced bug.

WhiteListRoundRobinPolicy with a NAT address not known to the cluster

This seems to be a wrong configuration. In such a case, I’m not sure whether introducing a fallback is the right solution.

In the issue, the node address was properly recognized before the version upgrade, so I don't think we need a fallback; rather, we need to find out where the introduced bug is. I will try to investigate this; I have asked for more information (like logs) in the issue.

@Lorak-mmk

From what I see, the bug described in the issue started occurring in version 3.29.8, while everything worked correctly in 3.29.7. Because of that, I’m not sure how this PR would be reverting a newly introduced bug.
This seems to be a wrong configuration. In such a case, I’m not sure whether introducing a fallback is the right solution.
In the issue the node address was properly recognized before the version upgrade, so I don't think we need fallback, but rather we need to find out where the introduced bug is.

+1, fallback to CC is definitely not the right solution to anything.

@dkropachev (Collaborator, Author) commented Feb 24, 2026

What happens is the following:

  1. The user has a TCP proxy in front of a node.
  2. The proxy's address is present neither in broadcast_rpc_address nor in rpc_address.
  3. The user opens a driver session targeting that TCP proxy.
  4. The driver fails to open a connection for any node pool, because the information it pulls from system.local and system.peers points to addresses that are unreachable.
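A toy model of how this scenario leaves the driver with no pools at all. The class and constant names mirror the driver's concepts (`WhiteListRoundRobinPolicy`, `HostDistance.IGNORED`) but this is an illustrative sketch, and all addresses are made up.

```python
from enum import Enum

class HostDistance(Enum):
    LOCAL = 0
    IGNORED = 2

class WhiteListPolicy:
    """Toy model of a whitelist policy's distance() decision."""
    def __init__(self, allowed):
        self.allowed = set(allowed)

    def distance(self, address):
        # Any host not on the whitelist is IGNORED: no pool is created for it.
        return HostDistance.LOCAL if address in self.allowed else HostDistance.IGNORED

# The user whitelists the NAT proxy address ...
policy = WhiteListPolicy(["203.0.113.10"])
# ... but system.local / system.peers advertise only internal addresses:
advertised = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
distances = {h: policy.distance(h) for h in advertised}
# Every advertised host is IGNORED, so the driver creates no pools at all.
```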

What the driver was doing before #623 is the following:

  1. Connect to the cluster via the contact endpoint.
  2. Pull system.local and system.peers, merging them into a single list; while merging, copy over the endpoint of the node the control connection is currently connected to, preserving the way that node is reached.
  3. Because the endpoint was copied over, the driver was able to create one node pool for the node the CC was connected to.
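The pre-#623 merge can be sketched like this. The row shapes and the `endpoint` / `rpc_address` keys are assumptions for illustration, not the driver's internal representation.

```python
def merge_topology(cc_endpoint, local_row, peer_rows):
    """Merge system.local and system.peers into one host list, keeping the
    endpoint the driver actually used to reach the control-connection node."""
    local = dict(local_row)
    # Copy over the endpoint used to reach this node, not its (possibly
    # unreachable) advertised address.
    local["endpoint"] = cc_endpoint
    hosts = [local]
    for peer in peer_rows:
        entry = dict(peer)
        entry["endpoint"] = entry["rpc_address"]  # peers keep advertised address
        hosts.append(entry)
    return hosts

hosts = merge_topology(
    "203.0.113.10:9042",                 # NAT proxy the CC connected through
    {"rpc_address": "10.0.0.1"},         # system.local row
    [{"rpc_address": "10.0.0.2"}, {"rpc_address": "10.0.0.3"}],
)
```

With the copied-over endpoint, the control-connection node stays reachable and one pool can be built for it, even though every advertised address is unreachable.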

Now, please throw in ideas on how to fix this scenario.

Creating a node pool using the same endpoint is a bad idea because in some cases, like Private Link, the driver is pointed at a load balancer that lands the connection on a random node; so while the driver thinks it is connecting to the same node, it is actually reaching different nodes.

@Lorak-mmk
Copy link

Creating a node pool using the same endpoint is a bad idea because in some cases, like Private Link, the driver is pointed at a load balancer that lands the connection on a random node; so while the driver thinks it is connecting to the same node, it is actually reaching different nodes.

Do we need to consider Private Link here? When we use it, the driver will know about it and can behave differently.
Imo, creating the pool using the address that was used to create the CC (as was done in the previous version) makes sense; just don't do this for Private Link.
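The suggestion boils down to a conditional endpoint choice. A minimal sketch, where the function name and the `is_private_link` flag are hypothetical, not part of the driver's API:

```python
def pool_endpoint_for_cc_node(cc_endpoint, advertised_address, is_private_link):
    """Pick the endpoint for the control-connection node's pool."""
    # Behind Private Link the CC address is a load balancer that may land on
    # a random node, so the advertised address must be used instead; in the
    # plain NAT/proxy case, reuse the endpoint the CC already connected through.
    return advertised_address if is_private_link else cc_endpoint
```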

@dkropachev (Collaborator, Author) commented Feb 24, 2026

Creating a node pool using the same endpoint is a bad idea because in some cases, like Private Link, the driver is pointed at a load balancer that lands the connection on a random node; so while the driver thinks it is connecting to the same node, it is actually reaching different nodes.

Do we need to consider Private Link here? When we use it, the driver will know about it and can behave differently. Imo, creating the pool using the address that was used to create the CC (as was done in the previous version) makes sense; just don't do this for Private Link.

We need to consider that when the user points the driver at something that is not a legitimate cluster node or entry point to the cluster, it could be anything: a simple TCP proxy, a single-node load balancer, or a cluster-wide load balancer.

In any scenario where it is not a legitimate entry point, we need to make sure that the driver doesn't misbehave but can still execute some queries.
Private Link is only an example here.



Development

Successfully merging this pull request may close these issues.

SCT failing to connect via public address with scylla-driver==3.29.8

3 participants